Concept Learning for Cross-Domain Text Classification: A General Probabilistic Framework
Authors
Abstract
Cross-domain learning aims to leverage knowledge from source domains to train accurate models for test data drawn from target domains with different but related data distributions. To tackle the challenge of distribution differences in the raw features, previous works proposed mining high-level concepts (e.g., word clusters) across data domains, which have been shown to be more appropriate for classification. However, all of these works assume that the same set of concepts is shared by the source and target domains, even though some distinct concepts may exist in only one of the domains. A general framework for cross-domain classification is therefore needed that can incorporate both shared and distinct concepts. To this end, we develop a probabilistic model in which both the shared and distinct concepts are learned by an EM procedure that optimizes the data likelihood. To validate the effectiveness of this model, we deliberately construct classification tasks in which distinct concepts exist in the data domains. Systematic experiments demonstrate the superiority of our model over all compared baselines, especially on the more challenging tasks.
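To make the idea concrete, the sketch below shows how shared and domain-specific concepts could be learned jointly with EM in a PLSA-style word-document model: the word distributions of the shared concepts are tied across the source and target domains (their expected counts are pooled in the M-step), while each domain keeps its own distinct concepts. This is a minimal illustration under those assumptions, not the authors' implementation; the function name em_shared_specific, the concept counts k_shared and k_spec, and the toy data are all invented for the example.

import numpy as np

rng = np.random.default_rng(0)

def em_shared_specific(X_src, X_tgt, k_shared=2, k_spec=1, n_iter=50, eps=1e-12):
    """X_src, X_tgt: (documents x words) count matrices for the two domains."""
    V = X_src.shape[1]
    K = k_shared + k_spec                                   # concepts seen by each domain
    doms = [X_src.astype(float), X_tgt.astype(float)]
    # P(w|z): the shared block is tied across domains, the specific block is not
    pw_shared = rng.dirichlet(np.ones(V), size=k_shared)    # (k_shared, V)
    pw_spec = [rng.dirichlet(np.ones(V), size=k_spec) for _ in doms]
    pz_d = [rng.dirichlet(np.ones(K), size=X.shape[0]) for X in doms]   # P(z|d)

    for _ in range(n_iter):
        shared_counts = np.zeros((k_shared, V))
        for i, X in enumerate(doms):
            pw = np.vstack([pw_shared, pw_spec[i]])         # (K, V)
            # E-step: responsibilities P(z|d,w) for every (document, word) pair
            joint = pz_d[i][:, :, None] * pw[None, :, :]    # (D, K, V)
            resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
            # Expected concept-word counts contributed by this domain
            counts = np.einsum('dv,dkv->kv', X, resp)       # (K, V)
            shared_counts += counts[:k_shared]
            spec = counts[k_shared:]
            pw_spec[i] = spec / (spec.sum(axis=1, keepdims=True) + eps)
            # M-step for the document-concept mixtures P(z|d)
            dz = np.einsum('dv,dkv->dk', X, resp)
            pz_d[i] = dz / (dz.sum(axis=1, keepdims=True) + eps)
        # M-step for the tied concepts: pool expected counts from both domains
        pw_shared = shared_counts / (shared_counts.sum(axis=1, keepdims=True) + eps)
    return pw_shared, pw_spec, pz_d

# Toy usage on random count matrices (purely illustrative)
X_src = rng.integers(0, 5, size=(8, 20))
X_tgt = rng.integers(0, 5, size=(6, 20))
pw_shared, pw_spec, pz_d = em_shared_specific(X_src, X_tgt)

Tying only part of the concept-word parameters is what lets the model keep a shared vocabulary-level bridge between domains while still absorbing domain-only topics into the distinct concepts.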
Similar Papers
Monolingual and Cross-Lingual Probabilistic Topic Models and Their Applications in Information Retrieval
Probabilistic topic models are a group of unsupervised generative machine learning models that can be effectively trained on large text collections. They model document content as a two-step generation process, i.e., documents are observed as mixtures of latent topics, while topics are probability distributions over vocabulary words. Recently, a significant research effort has been invested int...
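As a concrete illustration of the two-step generation process mentioned above, the short sketch below samples toy documents from an LDA-like model: each document first draws a mixture over latent topics, and each word is then drawn from the word distribution of a sampled topic. The Dirichlet hyperparameters and corpus sizes are arbitrary assumptions for the example, not values from the cited work.

import numpy as np

rng = np.random.default_rng(1)
n_topics, vocab_size, n_docs, doc_len = 3, 50, 5, 30

# Topics are probability distributions over vocabulary words
topic_word = rng.dirichlet(np.ones(vocab_size) * 0.1, size=n_topics)

documents = []
for _ in range(n_docs):
    # Step 1: a document is a mixture of latent topics
    doc_topics = rng.dirichlet(np.ones(n_topics) * 0.5)
    words = []
    for _ in range(doc_len):
        # Step 2: sample a topic, then sample a word from that topic
        z = rng.choice(n_topics, p=doc_topics)
        w = rng.choice(vocab_size, p=topic_word[z])
        words.append(int(w))
    documents.append(words)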
A COMMON FRAMEWORK FOR LATTICE-VALUED, PROBABILISTIC AND APPROACH UNIFORM (CONVERGENCE) SPACES
We develop a general framework for various lattice-valued, probabilistic and approach uniform convergence spaces. To this end, we use the concept of $s$-stratified $LM$-filter, where $L$ and $M$ are suitable frames. A stratified $LMN$-uniform convergence tower is then a family of structures indexed by a quantale $N$. For different choices of $L,M$ and $N$ we obtain the lattice-valued, probabili...
Incremental Clustering for the Classification of Concept-Drifting Data Streams
Concept drift is a common phenomenon in streaming data environments and constitutes an interesting challenge for researchers in the machine learning and data mining community. This paper proposes a probabilistic representation model for data stream classification and investigates the use of incremental clustering algorithms in order to identify and adapt to concept drift. An experimental study ...
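As a rough, generic illustration of that idea (not the paper's specific representation model), the sketch below maintains cluster centers incrementally over a stream and treats an instance that is far from every existing center as possible evidence of concept drift, seeding a new cluster for it. The class name, distance threshold, and toy stream are assumptions made for the example.

import numpy as np

class IncrementalClusterer:
    def __init__(self, drift_threshold=2.5):
        self.centers = []            # running cluster centers
        self.counts = []             # number of points absorbed by each center
        self.drift_threshold = drift_threshold

    def update(self, x):
        """Consume one stream instance; return True if it suggests drift."""
        x = np.asarray(x, dtype=float)
        if not self.centers:
            self.centers.append(x.copy())
            self.counts.append(1)
            return False
        dists = [np.linalg.norm(x - c) for c in self.centers]
        j = int(np.argmin(dists))
        if dists[j] > self.drift_threshold:
            # Far from every known cluster: open a new one (possible drift)
            self.centers.append(x.copy())
            self.counts.append(1)
            return True
        # Incremental mean update of the nearest center
        self.counts[j] += 1
        self.centers[j] += (x - self.centers[j]) / self.counts[j]
        return False

# Toy stream whose distribution shifts halfway through
rng = np.random.default_rng(2)
stream = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
clusterer = IncrementalClusterer()
drift_flags = [clusterer.update(x) for x in stream]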
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequate labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...
Learning to Adapt Across Multimedia Domains
In multimedia, machine learning techniques are often applied to build models that map low-level feature vectors into semantic labels. As data such as images and videos come from a variety of domains (e.g., genres, sources) with different distributions, there is a benefit in adapting models trained on one domain to other domains, in terms of improving performance and reducing computational and hu...